About the Data

This dataset reports estimated air pollution for each census tract. It focuses on concentrations of PM2.5, fine particulate matter with a diameter of 2.5 micrometers or less. PM2.5 concentrations are given in micrograms per cubic meter (µg/m³).

Variable Descriptions

glimpse(airquality)
## Rows: 42
## Columns: 9
## $ CTIDFP00        <dbl> 51003010100, 51003010200, 51003010300, 51003010400, 51…
## $ COUNTYFP00      <chr> "003", "003", "003", "003", "003", "003", "003", "003"…
## $ NAME00          <dbl> 101.00, 102.00, 103.00, 104.00, 105.00, 106.00, 107.00…
## $ PM2_5_1981      <dbl> 22.42496, 24.45130, 24.73297, 24.61517, 24.50003, 25.9…
## $ percentile_1981 <dbl> 43.03349, 54.89164, 56.49551, 55.79516, 55.17332, 63.5…
## $ PM2_5_2016      <dbl> 6.137776, 7.016078, 7.122480, 6.959763, 6.928336, 7.45…
## $ percentile_2016 <dbl> 21.99852, 37.59851, 40.22134, 36.26247, 35.50055, 49.5…
## $ PM_change       <dbl> -16.28718, -17.43522, -17.61049, -17.65541, -17.57169,…
## $ pctile_change   <dbl> -21.03497, -17.29313, -16.27417, -19.53269, -19.67276,…

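Each of the 42 rows is one 2000 census tract, identified by CTIDFP00 (the full 11-digit tract FIPS code), COUNTYFP00 (the 3-digit county FIPS code), and NAME00 (the tract number). The remaining columns are tract-level estimates of: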

  • PM2.5 concentrations in 1981 and 2016 (PM2_5_1981 and PM2_5_2016)
  • Percentile rankings in 1981 and 2016 (percentile_1981 and percentile_2016)
  • Change in PM2.5 concentration from 1981 to 2016, i.e. the 2016 value minus the 1981 value, so negative values mean PM2.5 fell (PM_change)
  • Change in percentile rank from 1981 to 2016, i.e. the 2016 rank minus the 1981 rank (pctile_change)
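The glimpse() values are consistent with each change column being the 2016 value minus the 1981 value. A minimal base-R check on the five rows printed above, with values copied as displayed (so a small tolerance absorbs rounding); `first_rows` is a hypothetical name used only for this sketch:

```r
# Values copied from the first five rows of the glimpse() output above.
# They are rounded as displayed, so exact equality is not expected.
first_rows <- data.frame(
  PM2_5_1981      = c(22.42496, 24.45130, 24.73297, 24.61517, 24.50003),
  PM2_5_2016      = c(6.137776, 7.016078, 7.122480, 6.959763, 6.928336),
  percentile_1981 = c(43.03349, 54.89164, 56.49551, 55.79516, 55.17332),
  percentile_2016 = c(21.99852, 37.59851, 40.22134, 36.26247, 35.50055),
  PM_change       = c(-16.28718, -17.43522, -17.61049, -17.65541, -17.57169),
  pctile_change   = c(-21.03497, -17.29313, -16.27417, -19.53269, -19.67276)
)

# Recompute each change as (2016 value) - (1981 value)
recomputed_pm  <- first_rows$PM2_5_2016 - first_rows$PM2_5_1981
recomputed_pct <- first_rows$percentile_2016 - first_rows$percentile_1981

# Largest disagreement with the stored columns; values near zero
# (rounding noise only) are consistent with that definition
max_pm_gap  <- max(abs(recomputed_pm - first_rows$PM_change))
max_pct_gap <- max(abs(recomputed_pct - first_rows$pctile_change))
c(PM_change = max_pm_gap, pctile_change = max_pct_gap)
```

Both gaps come out on the order of 1e-5, which is what rounding to the printed digits would produce.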

Summaries

Summary statistics (mean, standard deviation, minimum, median, and maximum) for the numeric measurement columns, rounded to whole numbers:

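library(dplyr)      # select(), where(), %>%
library(stargazer)  # plain-text summary tables

airquality %>%
  select(-c(CTIDFP00:NAME00)) %>%                    # drop the three ID columns
  select(where(~ is.numeric(.x) && !anyNA(.x))) %>%  # numeric columns with no missing values
  as.data.frame() %>%                                # stargazer expects a plain data.frame
  stargazer(., type = "text", title = "Summary Statistics", digits = 0,
            summary.stat = c("mean", "sd", "min", "median", "max"))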
## 
## Summary Statistics
## ============================================
## Statistic       Mean St. Dev. Min Median Max
## --------------------------------------------
## PM2_5_1981       26     2     21    25   32 
## percentile_1981  61     13    36    59   92 
## PM2_5_2016       7      1      6    7     8 
## percentile_2016  45     16    18    42   74 
## PM_change       -18     2     -23  -18   -15
## pctile_change   -16     3     -22  -16   -9 
## --------------------------------------------
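The table above reports the mean and standard deviation rather than quartiles. If a Tukey five-number summary (minimum, lower hinge, median, upper hinge, maximum) is wanted, base R's fivenum() computes one. The helper five_number_table() below is a hypothetical sketch, demonstrated on the first five tracts as printed by glimpse():

```r
# Hypothetical helper: Tukey five-number summary of every numeric column,
# returned as a matrix with one row per variable.
five_number_table <- function(df) {
  # Keep numeric columns only (e.g. drops the character COUNTYFP00 column)
  num <- df[vapply(df, is.numeric, logical(1))]
  # fivenum() returns min, lower hinge, median, upper hinge, max (NAs dropped)
  out <- t(vapply(num, fivenum, numeric(5)))
  colnames(out) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
  out
}

# Example on values copied from the glimpse() output above
example <- data.frame(
  COUNTYFP00 = c("003", "003", "003", "003", "003"),
  PM2_5_1981 = c(22.42496, 24.45130, 24.73297, 24.61517, 24.50003),
  PM2_5_2016 = c(6.137776, 7.016078, 7.122480, 6.959763, 6.928336)
)
example_summary <- five_number_table(example)
example_summary
```

On the full data, a call such as `five_number_table(airquality)` would summarize all numeric columns, including the ID columns CTIDFP00 and NAME00, so those may be worth dropping first. Tukey's hinges can differ slightly from the quartiles returned by quantile().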

Visual Distributions

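Histograms of the 1981 and 2016 percentile rankings, followed by scatterplots comparing each tract's rank across the two years and against its change in rank: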

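library(tidyr)    # pivot_longer()
library(ggplot2)

airquality %>% select(CTIDFP00, percentile_1981, percentile_2016) %>% 
  pivot_longer(-CTIDFP00, names_to = "measure", values_to = "value") %>% 
  ggplot(aes(x = value, fill = measure)) + 
  geom_histogram(binwidth = 5, boundary = 0, show.legend = FALSE) +  # 5-point bins on 0, 5, ..., 100
  facet_wrap(~measure, scales = "free_y") +  # shared 0-100 x axis, separate count axes
  xlim(0, 100)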

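# 1981 vs. 2016 rank; points below the dashed y = x line ranked lower in 2016
airquality %>% 
  ggplot() +
  geom_point(aes(x = percentile_1981, y = percentile_2016)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  xlim(0, 100) +
  ylim(0, 100) +
  labs(x = "Percentile, 1981", y = "Percentile, 2016")

# Change in rank against the 1981 rank
airquality %>% 
  ggplot() +
  geom_point(aes(x = percentile_1981, y = pctile_change)) +
  xlim(0, 100) +
  labs(x = "Percentile, 1981", y = "Percentile change, 1981-2016")

# Change in rank against the 2016 rank
airquality %>% 
  ggplot() +
  geom_point(aes(x = percentile_2016, y = pctile_change)) +
  xlim(0, 100) +
  labs(x = "Percentile, 2016", y = "Percentile change, 1981-2016")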

Spatial Distributions

Percentile in 1981

library(leaflet)  # leaflet(), colorNumeric(), addPolygons(), addLegend()

pal <- colorNumeric("plasma", reverse = TRUE, domain = cvilleshapes$percentile_1981)
leaflet(cvilleshapes) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(data = cvilleshapes,
              fillColor = ~pal(percentile_1981),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlightOptions = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = TRUE),
              popup = paste0("Tract Number: ", cvilleshapes$NAME00, "<br>",
                             "Percentile: ", round(cvilleshapes$percentile_1981, 2))) %>%
  addLegend("bottomright", pal = pal, values = cvilleshapes$percentile_1981,
            title = "Percentile, 1981", opacity = 0.7)

Percentile in 2016

pal <- colorNumeric("plasma", reverse = TRUE, domain = cvilleshapes$percentile_2016)
leaflet(cvilleshapes) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(data = cvilleshapes,
              fillColor = ~pal(percentile_2016),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlightOptions = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = TRUE),
              popup = paste0("Tract Number: ", cvilleshapes$NAME00, "<br>",
                             "Percentile: ", round(cvilleshapes$percentile_2016, 2))) %>%
  addLegend("bottomright", pal = pal, values = cvilleshapes$percentile_2016,
            title = "Percentile, 2016", opacity = 0.7)

Percentile Change, 1981-2016

pal <- colorNumeric("plasma", reverse = TRUE, domain = cvilleshapes$pctile_change)
leaflet(cvilleshapes) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(data = cvilleshapes,
              fillColor = ~pal(pctile_change),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlightOptions = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = TRUE),
              popup = paste0("Tract Number: ", cvilleshapes$NAME00, "<br>",
                             "Percentile Change: ", round(cvilleshapes$pctile_change, 2))) %>%
  addLegend("bottomright", pal = pal, values = cvilleshapes$pctile_change,
            title = "Percentile Change, 1981-2016", opacity = 0.7)

Change in PM2.5

pal <- colorNumeric("plasma", reverse = TRUE, domain = cvilleshapes$PM_change)
leaflet(cvilleshapes) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(data = cvilleshapes,
              fillColor = ~pal(PM_change),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlightOptions = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = TRUE),
              popup = paste0("Tract Number: ", cvilleshapes$NAME00, "<br>",
                             "PM2.5 Change: ", round(cvilleshapes$PM_change, 2))) %>%
  addLegend("bottomright", pal = pal, values = cvilleshapes$PM_change,
            title = "Change in PM2.5, 1981-2016", opacity = 0.7)
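The four maps above differ only in the mapped variable, the legend title, and the popup label. As a sketch, assuming cvilleshapes is an sf object with a NAME00 column and the leaflet package is installed, a hypothetical helper map_tract_var() could replace the repeated blocks. It calls leaflet functions with the `leaflet::` prefix so no pipe or library() call is needed, and passes fillColor as a precomputed vector of colors rather than a formula:

```r
# Hypothetical helper: one choropleth of column `var` from `shapes`, using the
# same palette, styling, popup, and legend settings as the maps above.
map_tract_var <- function(shapes, var, title, popup_label = title) {
  values <- shapes[[var]]
  pal <- leaflet::colorNumeric("plasma", reverse = TRUE, domain = values)

  m <- leaflet::leaflet(shapes)
  m <- leaflet::addProviderTiles(m, "CartoDB.Positron")
  m <- leaflet::addPolygons(
    m,
    fillColor = pal(values),  # one color per tract polygon
    weight = 1, opacity = 1, color = "white", fillOpacity = 0.6,
    highlightOptions = leaflet::highlightOptions(weight = 2, fillOpacity = 0.8,
                                                 bringToFront = TRUE),
    popup = paste0("Tract Number: ", shapes$NAME00, "<br>",
                   popup_label, ": ", round(values, 2))
  )
  leaflet::addLegend(m, "bottomright", pal = pal, values = values,
                     title = title, opacity = 0.7)
}
```

For example, `map_tract_var(cvilleshapes, "PM_change", "Change in PM2.5, 1981-2016", "PM2.5 Change")` should reproduce the last map, and `map_tract_var(cvilleshapes, "percentile_1981", "Percentile, 1981", "Percentile")` the first.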

Important Notes

The census tracts in this dataset use 2000 census boundaries, but our shapefiles use 2010 boundaries, so not every tract in the data lines up with a polygon on the maps. The tracts that differ are:

  • 102 (Albemarle): split into 102.01 and 102.02 in the 2010 census.
  • 104 (Albemarle): split into 104.01 and 104.02 in the 2010 census.
  • 106 (Albemarle): split into 106.01 and 106.02 in the 2010 census.
  • 112 (Albemarle): split into 112.01 and 112.02 in the 2010 census.
  • 113 (Albemarle): split into 113.01, 113.02, and 113.03 in the 2010 census.
  • 201 (Fluvanna): split into 201.01 and 201.02 in the 2010 census.
  • 301 (Greene): split into 301.01 and 301.02 in the 2010 census.
  • 9502 (Louisa): split into 9502.01 and 9502.02 in the 2010 census.
  • 9504 and 9505 (Louisa): exist in the 2010 census but are not present in this data.
  • 1 (Charlottesville): present in this data but does not exist in the 2010 census.
  • 3.01 (Charlottesville): present in this data but does not exist in the 2010 census.
  • 10 (Charlottesville): exists in the 2010 census but is not present in this data.
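One way to map the split tracts is a small crosswalk from each 2010 child tract back to its 2000 parent, built directly from the list above. The sketch below is hypothetical (the names crosswalk and to_2010_keys() are illustrative) and assumes tract numbers alone are unique across the counties listed, so the county is kept only as a label. Each 2010 child inherits its parent's values, which treats PM2.5 as roughly uniform within the 2000 tract:

```r
# Hypothetical crosswalk from the list above: 2010 child tract -> 2000 parent
crosswalk <- data.frame(
  county = c(rep("Albemarle", 11), rep("Fluvanna", 2),
             rep("Greene", 2), rep("Louisa", 2)),
  NAME00 = c(102, 102, 104, 104, 106, 106, 112, 112, 113, 113, 113,
             201, 201, 301, 301, 9502, 9502),
  NAME10 = c(102.01, 102.02, 104.01, 104.02, 106.01, 106.02,
             112.01, 112.02, 113.01, 113.02, 113.03,
             201.01, 201.02, 301.01, 301.02, 9502.01, 9502.02)
)

# Give every 2000-tract row a NAME10 join key. Split tracts are duplicated,
# one row per 2010 child carrying the parent's values; unsplit tracts keep
# their own number. Tracts 1 and 3.01 will still match no 2010 polygon, and
# 2010 tracts 9504, 9505, and 10 will still have no data.
# Assumption: the 2010 shapes store the tract number as a number; convert
# with as.numeric() first if it is stored as text.
to_2010_keys <- function(df, crosswalk) {
  out <- merge(df, crosswalk[, c("NAME00", "NAME10")],
               by = "NAME00", all.x = TRUE)
  out$NAME10 <- ifelse(is.na(out$NAME10), out$NAME00, out$NAME10)
  out
}

# Example on the first four tracts from the glimpse() output above
example <- data.frame(NAME00    = c(101, 102, 103, 104),
                      PM_change = c(-16.28718, -17.43522, -17.61049, -17.65541))
example_2010 <- to_2010_keys(example, crosswalk)
example_2010
```

Tracts 102 and 104 each become two rows, so the four input tracts yield six rows keyed by NAME10.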